Bole Ma

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

May 31, 2026

Leyline: KV Cache Directives for Agentic Inference

May 31, 2026

Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory

May 20, 2026

The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures

May 12, 2026

Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving

May 07, 2026

Pupil Design for Computational Wavefront Estimation

Mar 31, 2026

Virtual Width Networks

Nov 17, 2025

Truncated Proximal Policy Optimization

Jun 18, 2025

Model Merging in Pre-training of Large Language Models

May 17, 2025

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Apr 08, 2025